A Discovery Procedure For Certain Phonological Rules
Abstract
Acquisition of phonological systems can be insightfully studied in terms of discovery procedures. This paper describes a discovery procedure, implemented in Lisp, capable of determining a set of ordered phonological rules, which may be in opaque contexts, from a set of surface forms arranged in paradigms.

1. INTRODUCTION

For generative grammarians, such as Chomsky (1965), a primary problem of linguistics is to explain how the language learner can acquire the grammar of his or her language on the basis of the limited evidence available to him or her. Chomsky introduced the idealization of instantaneous acquisition, which I adopt here, in order to model the language acquisition device as a function from primary linguistic data to possible grammars, rather than as a process. Assuming that the set of possible human languages is small, rather than large, appears to make acquisition easier, since there are fewer possible grammars to choose from, and less data should be required to choose between them. Accordingly, generative linguists are interested in delimiting the class of possible human languages. This is done by looking for properties common to all human languages, or universals. Together, these universals form universal grammar, a set of principles that all human languages obey. Assuming that universal grammar is innate, the language learner can use it to restrict the number of possible grammars he or she must consider when learning a language. As part of universal grammar, the language learner is supposed to innately possess an evaluation metric, which is used to "decide" between two grammars when both are consistent with other principles of universal grammar and the available language data.

2. DISCOVERY PROCEDURES

This approach deals with acquisition without reference to a specific discovery procedure, and so in some sense the results of such research are general, in that in principle they apply to all discovery procedures. Still, I think that there is some utility in considering the problem of acquisition in terms of actual discovery procedures.

Firstly, we can identify the parts of a grammar that are underspecified with respect to the available data. Parts of a grammar or a rule are strongly data determined if they are fixed or uniquely determined by the data, given the requirement that the overall grammar be empirically correct. By contrast, a part of a grammar or of a rule is weakly data determined if there is a large class of grammar or rule parts that are all consistent with the available data. For example, if there are two possible analyses that equally well account for the available data, then the choice of which of these analyses should be incorporated in the final grammar is weakly data determined. Strong or weak data determination is therefore a property of the grammar formalism and the data combined, and independent of the choice of discovery procedure.

Secondly, a discovery procedure may partition a phonological system in an interesting way. For instance, in the discovery procedure described here the evaluation metric is not called upon to compare one grammar with another, but rather to make smaller, more local, comparisons. This leads to a factoring of the evaluation metric that may prove useful for its further investigation.

Thirdly, focussing on discovery procedures forces us to identify what the surface indications of the various constructions in the grammar are. Of course, this does not mean one should look for a one-to-one correspondence between individual grammar constructions and the surface data, but rather for complexes of grammar constructions that interact to yield particular patterns on the surface. One is then investigating the logical implications of the existence of a particular construction in the data.

Following from the last point, I think a discovery procedure should have a deductive rather than enumerative structure. In particular, procedures that work essentially by enumerating all possible (sub)grammars and seeing which ones work are not only in general very inefficient, but also not very insightful. These discovery-by-enumeration procedures simply give us a list of all rule systems that are empirically adequate as a result, but they give us no idea as to what properties of these systems were crucial in their being empirically adequate. This is because the structure imposed on the problem by a simple recursive enumeration procedure is in general not related to the intrinsic structure of the rule discovery problem.

3. A PHONOLOGICAL RULE DISCOVERY PROCEDURE

Below and in Appendix A I outline a discovery procedure, which I have fully implemented in Franz Lisp on a VAX 11/750 computer, for a restricted class of phonological rules, namely rules of the type shown in (1).
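The distinction between strong and weak data determination drawn in Section 2 can be made concrete with a small sketch. The Python below is not the Lisp implementation described in this paper, and everything in it is a hypothetical illustration: the two-part grammar space (`underlying_final`, `final_devoicing`), the toy stem "bun-", and the two-cell paradigm are invented for the example. A part is reported as "strong" when every grammar consistent with the data agrees on its value, and "weak" otherwise; with only two options per part, "weak" here simply means more than one value survives, rather than a large class.

```python
from itertools import product

# Hypothetical toy grammar space: two "parts", each with a small set of options.
PARTS = {
    "underlying_final": ["d", "t"],     # final consonant of the toy stem /bun_/
    "final_devoicing": [True, False],   # does a word-final devoicing rule exist?
}


def derive(grammar):
    """Return the surface forms of a two-cell paradigm predicted by a toy grammar."""
    final = grammar["underlying_final"]
    # Devoicing only affects a voiced consonant in word-final position.
    devoiced = grammar["final_devoicing"] and final == "d"
    bare_final = "t" if devoiced else final
    return {"bare": "bun" + bare_final, "suffixed": "bun" + final + "e"}


def consistent_grammars(data):
    """Yield every toy grammar whose predictions match all observed cells."""
    names = list(PARTS)
    for values in product(*(PARTS[name] for name in names)):
        grammar = dict(zip(names, values))
        predicted = derive(grammar)
        if all(predicted[cell] == form for cell, form in data.items()):
            yield grammar


def determination(data):
    """Classify each grammar part as 'strong' or 'weak' relative to the data."""
    grammars = list(consistent_grammars(data))
    if not grammars:
        raise ValueError("no grammar in the toy space is consistent with the data")
    report = {}
    for name in PARTS:
        surviving_values = {g[name] for g in grammars}
        report[name] = "strong" if len(surviving_values) == 1 else "weak"
    return report


if __name__ == "__main__":
    print(determination({"bare": "bunt", "suffixed": "bunde"}))
    print(determination({"bare": "bunt"}))
    print(determination({"suffixed": "bunde"}))
```

Note that this sketch classifies parts by brute-force enumeration of the grammar space, which is exactly the kind of structure argued against above for a discovery procedure. It is used here only because the definition of data determination quantifies over all consistent grammars, and it says nothing about how the procedure in this paper actually finds rules.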